A Supervised Statistical Data Quantization Method in Machine Learning

Authors

  • Jing Wang
  • Haohan Xie
  • Xiangzheng Zhao
  • Ye Liang
  • Xihang Zhang
  • Yajun Zhang

Abstract

Data quantization (discretization) methods for continuous attributes play an important role in artificial intelligence, data mining, and machine learning, because most classification methods require discrete attribute values. In this paper, we present a supervised statistical data quantization method. It defines a quantization criterion based on the chi-square statistic to identify accurate intervals for merging. In addition, we propose a heuristic quantization algorithm that aims to produce a quantization scheme that improves the performance of inductive learning algorithms. Experiments on real-world data sets from the UCI repository show that the proposed algorithm generates a better quantization scheme than existing algorithms, improving the classification accuracy of the C4.5 decision tree.
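The abstract does not spell out the exact criterion or heuristic, so the following is only a minimal sketch of the classic ChiMerge-style bottom-up merge that chi-square-based quantization generally builds on; it is not the authors' algorithm. For two adjacent intervals it computes chi2 = sum_i sum_j (A_ij - E_ij)^2 / E_ij, where A_ij is the number of samples of class j in interval i and E_ij = R_i * C_j / N is the expected count, then repeatedly merges the adjacent pair with the smallest chi2 until every remaining pair reaches a threshold. The function names `chi_square` and `chimerge`, the `threshold` and `min_intervals` parameters, and the choice to skip classes absent from both intervals are illustrative assumptions.

```python
from typing import Hashable, List, Sequence, Tuple


def chi_square(counts_a: Sequence[int], counts_b: Sequence[int]) -> float:
    """Chi-square statistic of the 2 x k table formed by two adjacent intervals.

    counts_a[j] and counts_b[j] are the numbers of class-j samples in each interval.
    Assumption: classes absent from both intervals are skipped, so every
    expected count used below is strictly positive.
    """
    rows = (counts_a, counts_b)
    total = sum(counts_a) + sum(counts_b)
    row_sums = [sum(r) for r in rows]
    col_sums = [a + b for a, b in zip(counts_a, counts_b)]
    chi2 = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            if col_sums[j] == 0:
                continue  # class j occurs in neither interval
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def chimerge(values: Sequence[float], labels: Sequence[Hashable],
             threshold: float, min_intervals: int = 1) -> List[float]:
    """Hypothetical ChiMerge-style quantizer; returns interval lower bounds."""
    # Map each class label to a column index (order of first appearance).
    index = {c: k for k, c in enumerate(dict.fromkeys(labels))}
    # Start with one interval per distinct value, holding per-class counts.
    intervals: List[Tuple[float, List[int]]] = []
    for v, y in sorted(zip(values, labels), key=lambda t: t[0]):
        if intervals and intervals[-1][0] == v:
            intervals[-1][1][index[y]] += 1
        else:
            counts = [0] * len(index)
            counts[index[y]] += 1
            intervals.append((v, counts))
    # Merge the most similar adjacent pair until all pairs differ enough.
    while len(intervals) > max(1, min_intervals):
        chis = [chi_square(intervals[i][1], intervals[i + 1][1])
                for i in range(len(intervals) - 1)]
        i = min(range(len(chis)), key=chis.__getitem__)
        if chis[i] >= threshold:
            break
        lower, counts = intervals[i]
        merged = [a + b for a, b in zip(counts, intervals[i + 1][1])]
        intervals[i:i + 2] = [(lower, merged)]
    return [lower for lower, _ in intervals]
```

For example, `chimerge([1, 2, 3, 4, 5, 6], list("aaabbb"), threshold=2.706)` returns `[1, 4]`, the lower bounds of two intervals that separate the classes; 2.706 is the chi-square critical value at the 0.90 level with one degree of freedom, which matches a two-class problem.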

Similar sources

Border sensitive fuzzy vector quantization in semi-supervised learning

Abstract. We propose a semi-supervised fuzzy vector quantization method for the classification of incompletely labeled data. Since information contained within the structure of the data set should not be neglected, our method considers the whole data set during the learning process. Unlike known methods, our approach uses neighborhood cooperativeness for stable prototype learning known...

Supervised Competition Using Joined Growing Neural Gas

Competitive learning is a well-known method for processing data. Various goals, such as classification or vector quantization, may be achieved using competitive learning. In this paper, we present a different insight into the principle of supervised competitive learning. An innovative approach to supervised self-organization is suggested. The method is based on different handling of input data labe...

Emotion Detection in Persian Text; A Machine Learning Model

This study aimed to develop a computational model for recognizing emotion in Persian text as a supervised machine learning problem. We adopted Plutchik's emotion model as the supervised learning criterion and a Support Vector Machine (SVM) as the baseline classifier. We also used the NRC lexicon and contextual features as training data and components of the model. One hundred selected texts including pol...

Composite Kernel Optimization in Semi-Supervised Metric

Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...

Statistical machine learning for data mining and collaborative multimedia retrieval

of thesis entitled: Statistical Machine Learning for Data Mining and Collaborative Multimedia Retrieval Submitted by HOI, Chu Hong (Steven) for the degree of Doctor of Philosophy at The Chinese University of Hong Kong in September 2006 Statistical machine learning techniques have been widely applied in data mining and multimedia information retrieval. While traditional methods, such as supervis...

Journal:
  • Journal of Multimedia

Volume 8

Publication date: 2013